Scalable Hash-Based Estimation of Divergence Measures

Authors

  • Morteza Noshad
  • Alfred O. Hero
Abstract

We propose a scalable divergence estimation method based on hashing. Consider two continuous random variables X and Y whose densities have bounded support. We use a particular locality-sensitive random hashing and, in each hash bin that contains a non-zero number of Y samples, consider the ratio of the number of X samples to the number of Y samples. We prove that the weighted average of these ratios over all of the hash bins converges to f-divergences between the two sample sets. We show that the proposed estimator is optimal in terms of both MSE rate and computational complexity. We derive the MSE rates for two families of smooth functions: the Hölder smoothness class and differentiable functions. In particular, we prove that if the density functions have bounded derivatives up to order d/2, where d is the dimension of the samples, the optimal parametric MSE rate of O(1/N) can be achieved. The computational complexity is shown to be O(N), which is optimal. To the best of our knowledge, this is the first empirical divergence estimator that has optimal computational complexity and achieves the optimal parametric MSE estimation rate.
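The bin-counting idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimator: the plain grid hash (`grid_hash`), the bin width `eps`, the `offset` parameter, and the function names are all assumptions made for illustration, and the sketch omits any bias correction or randomization the paper may use. It counts X and Y samples per hash bin and averages g(ratio) over bins that contain Y samples, weighted by the fraction of Y samples in each bin. With g(t) = t log t this is a plug-in estimate of the KL divergence of X from Y.

```python
import math
from collections import defaultdict


def grid_hash(point, eps, offset=0.0):
    """Hypothetical locality-sensitive hash: quantize each coordinate to a
    grid cell of width eps, so nearby points tend to share a key."""
    return tuple(math.floor((x + offset) / eps) for x in point)


def hash_divergence(xs, ys, g, eps, offset=0.0):
    """Plug-in f-divergence estimate from per-bin sample counts.

    xs, ys: lists of points (tuples of floats); g: the convex function
    defining the f-divergence; eps: grid cell width (an assumed tuning knob).
    Expected running time is linear in len(xs) + len(ys).
    """
    n, m = len(xs), len(ys)
    x_counts = defaultdict(int)
    y_counts = defaultdict(int)
    for p in xs:
        x_counts[grid_hash(p, eps, offset)] += 1
    for p in ys:
        y_counts[grid_hash(p, eps, offset)] += 1

    total = 0.0
    # Only bins with at least one Y sample contribute.
    for key, m_i in y_counts.items():
        n_i = x_counts.get(key, 0)
        ratio = (n_i / n) / (m_i / m)   # empirical density ratio in this bin
        total += (m_i / m) * g(ratio)   # weight by the bin's share of Y samples
    return max(total, 0.0)              # clip, since f-divergences are non-negative


def kl_g(t):
    """g(t) = t log t, with g(0) = 0, gives the KL divergence of X from Y."""
    return t * math.log(t) if t > 0 else 0.0
```

Calling `hash_divergence(pts, pts, kl_g, 0.1)` on two identical sample sets returns `0.0`, since every bin ratio is exactly 1. The hash tables make a single pass over each sample set, which is where the linear cost comes from in this sketch; choosing `eps` as a function of sample size is left open here.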

Similar resources

An Improved Hash Function Based on the Tillich-Zémor Hash Function

Using the idea behind the Tillich-Zémor hash function, we propose a new hash function. Our hash function is parallelizable, and its collision resistance is implied by a hardness assumption on a mathematical problem. It is also secure against the known attacks. It is the most secure variant of the Tillich-Zémor hash function to date.

Overlap-Aware Global df Estimation in Distributed Information Retrieval Systems

Peer-to-Peer (P2P) search engines and other forms of distributed information retrieval (IR) are gaining momentum. Unlike in centralized IR, it is difficult and expensive to compute statistical measures about the entire document collection as it is widely distributed across many computers in a highly dynamic network. On the other hand, such network-wide statistics, most notably, global document ...

Robust Estimation in Linear Regression Model: the Density Power Divergence Approach

The minimum density power divergence method provides a robust estimate when the dataset contains outliers. In this study, we introduce and use a robust minimum density power divergence estimator to estimate the parameters of the linear regression model, and then, with some numerical examples of the linear regression model, we show the robustness of this est...

A note on decision making in medical investigations using new divergence measures for intuitionistic fuzzy sets

Srivastava and Maheshwari (Iranian Journal of Fuzzy Systems 13(1) (2016) 25-44) introduced a new divergence measure for intuitionistic fuzzy sets (IFSs). The properties of the proposed divergence measure were studied and the efficiency of the proposed divergence measure in the context of medical diagnosis was also demonstrated. In this note, we point out some errors in ...

Divergence times and morphological evolution of the subtribe Eritrichiinae (Boraginaceae-Rochelieae) with special reference to Lappula

The subtribe Eritrichiinae belongs to the tribe Rochelieae (Boraginaceae; Cynoglossoideae), which is composed of about 200 species in five genera: Eritrichium, Lappula, Hackelia, Lepechiniella, and Rochelia. The majority of the species are annual and grow in xeric habitats. The genus Lappula as an arid adapted and the second biggest genus...

Journal title:
  • CoRR

Volume: abs/1801.00398

Publication date: 2018